- From: clive@sco.com (Clive D.W. Feather)
- Subject: The Annotated Annotated C Standard
- Sender: clive@x.co.uk (Clive Feather)
- Date: Tue, 22 Mar 1994 20:22:46 GMT
- Organization: Santa Cruz Organization
- Lines: 804
- Xref: fido.asd.sgi.com comp.lang.c:76863 comp.std.c:11897
-
-
- The Annotated Annotated C Standard
- ==================================
- C.D.W.Feather
- =============
-
- This is a review of _The_Annotated_ANSI_C_Standard_, annotated by Herbert
- Schildt.
-
- This review is made possible by the generosity of Raymond Chen
- <raymondc@microsoft.com>, who provided the review copy of the book,
- and is dedicated to the Dream Inn, Santa Cruz, CA, whose staff
- supplied countless cups of coffee while I wrote this review.
-
-
- This version was modified on 1994-03-26. Thanks to the following for
- pointing out errors:
- Christopher R Volpe <volpe@ausable.crd.ge.com>
- Jutta Degener <jutta@cs.tu-berlin.de>
- Mark-Jason Dominus <mjd@saul.cis.upenn.edu>
- Sue Meloy <suem@hprpcd.rose.hp.com>
-
-
- Introduction
- ------------
- Since _The_Annotated_ANSI_C_Standard_ first appeared, many people have
- commented on errors in the book. After reading several of these, I obtained
- a copy of the book and have read it in its entirety.
-
- Many of these comments might appear to be relatively trivial. In response to
- this, I can only point out that the book is commenting on a very carefully
- designed document, and one that has to be read precisely. If the annotator
- cannot get things right, then the book is not just useless, but is a positive
- danger to those who do not have the time to read and analyse every word of the
- standard. In other contexts, such as a tutorial on C, some of the errors in
- this book could be allowed to pass, but not in this.
-
- When I state that no mention is made of a topic, this indicates that I feel
- that the topic is at least as important as ones that were commented on; quite
- often this refers to the features of the standard which are less easy to
- understand.
-
- Text quoted directly from the book is indicated by @@ in the left margin.
-
-
- General comments
- ----------------
- Quite often, the book gives the impression that annotations were omitted
- because they couldn't be fitted into the format of "standard on the left,
- comments on the right". Whilst many pages of the standard have no annotations
- at all, there are no pages with annotation but no standard. I note at least
- one case below where I believe that a function was not annotated because the
- comments on the previous section took up too much space.
-
- The front cover of the book shows, amongst much clutter and someone's
- half-eaten muffin, page 147 of the standard. It is intriguing to note that,
- not only is this the obsolete ANSI standard rather than the ISO standard,
- but that it corresponds to half of page 146 in the book.
-
- The major divisions of the standard are referred to as "Part 1", "Part 2", etc.
- In actual fact, they are "clause 1", "clause 2", and so on. One has to wonder
- about an author who can't even get that right.
-
-
- Specific comments
- -----------------
- Numbers at the start of each comment are the ISO subclause numbers of
- Schildt's annotations, which are not always the same as the subclause actually
- being annotated.
-
- 3.10, 3.16, 3.17
- A proper understanding of the terms "implementation-defined", "undefined",
- and "unspecified", and of the differences between them, is essential to
- understanding the limits that the standard puts on the programmer and the
- implementor. Unfortunately, the differences are not explained at all, and
- the book leaves me wondering why the different terms are used at all.
-
- 3.13
- @@ However, this limits the total character set to 255 characters.
- Actually, it limits it to UCHAR_MAX characters, which is at least 255, but
- can be more. There was an opportunity here to explain what multibyte characters
- actually are, but it seems to have been missed, possibly because of the lack
- of space.
-
- 3.14
- @@ An object is either a variable or a constant that resides at a physical
- @@ memory address.
- In C, a constant does not reside in memory (except for some string literals)
- and so is not an object.
-
-
- 5.1.1.3
- The standard is clear that diagnostics are required when syntax rules and
- constraints are violated, and are optional otherwise. This is not covered at
- all. Instead we get the vague statement that
- @@ The standard requires that a compiler issue error messages when an error
- @@ in the source code is encountered.
- without discussing the different kinds of errors.
-
- 5.1.2.2
- @@ You are therefore free to declare main() as required by your program.
- This statement is immediately followed by the example:
- void main (void)
- even though the text of the standard directly opposite states that this is
- undefined. Indeed, the text I quote makes me wonder whether Schildt believes
- that:
- struct foo { int i; double d; } main (double argc, struct foo argv)
- is permitted !
-
- Most of the examples in the book declare main() as void. I won't bother to
- point them out individually.
-
- 5.1.2.2.1
- @@ Though most compilers will automatically return 0 when no other return
- @@ value is specified (even when main() is declared as void), you should
- @@ not rely on this fact because it is not guaranteed by the standard.
- Indeed it is not. If main() is declared as void, I don't know of any compiler
- that will return 0. Indeed, the standard forbids it to !
-
- 5.1.2.3
- This section is often called the "as if" rule, because it says that an
- implementation may do anything providing that the effect is "as if" the
- exact wording of the standard was followed. This is almost completely ignored
- in favour of explaining "side effect" and "automatic storage".
-
- 5.2.1.2
- @@ Therefore, a multibyte character is a character that requires more than
- @@ one byte.
- Ignoring the fact that "character" and "byte" are synonymous in the standard
- (something that is not mentioned in the annotations), the definition of
- multibyte character is clear that it *does* include single byte characters.
-
- @@ First, the null character may not be used except in the first byte of a
- @@ multibyte sequence.
- I read this as meaning that the multibyte character <00><94> is legal while
- the multibyte character <94><00> is not. In actual fact, the standard states
- that a zero byte must not appear in *any* multibyte character other than the
- null character (i.e. the end of string indicator). This means that string
- operations such as strcpy will work as expected with multibyte character
- sequences.
-
- There was an opportunity here to explain multibyte characters and how to
- use them, something that most books omit. Unfortunately, this one omits it
- as well.
-
- 5.2.3
- @@ In other words, one copy of a library function in memory may not be used
- @@ by two or more currently executing programs.
- This is blatant nonsense - on most Unix systems, if the same program is
- executing several times, all the code is shared by all of its processes. Indeed,
- many go further and share one copy of the standard C library among every
- process on the system.
-
- What this section of the standard is talking about is re-entrancy. The
- functions in the library are not re-entrant, and so may not be called from
- within themselves. For example:
- * qsort() cannot be called from within the compare function passed to qsort();
- * if a signal can be raised within a library function (perhaps by an external
- event such as the user pressing a BREAK key), then the signal handler must
- not call that library function.
- The latter rule is particularly important: code using malloc must not call
- malloc from within signal handlers.
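A minimal sketch of the pattern this rule pushes you towards: the handler calls no library function at all and only records the event in a volatile sig_atomic_t object, which the main program polls later. The names interrupted, on_interrupt, install_interrupt_flag, and was_interrupted are illustrative only; they come from neither the book nor the standard.

```c
#include <signal.h>

/* Set by the handler, polled by the main program. */
static volatile sig_atomic_t interrupted = 0;

/* The handler calls no library function (no malloc, no printf);
   it only stores to a volatile sig_atomic_t object. */
static void on_interrupt(int signum)
{
    (void)signum;
    interrupted = 1;
}

/* Returns 0 on success, -1 if the handler could not be installed. */
int install_interrupt_flag(void)
{
    return signal(SIGINT, on_interrupt) == SIG_ERR ? -1 : 0;
}

/* Returns non-zero once SIGINT has been seen. */
int was_interrupted(void)
{
    return interrupted != 0;
}
```

Any real work (including anything that allocates memory) is then done in the main program after was_interrupted() reports the event.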
-
- 5.2.4.1
- @@ A compound statement is a block of code.
- A nice sounding statement, but totally meaningless. A compound statement is
- a block of code beginning with { and ending with the matching }. For example,
- the body of a function is a compound statement.
-
- 5.2.4.2
- @@ First, notice that a character is defined as 8 bits (1 byte). All other
- @@ types may vary in size, but in C a character is always 1 byte long.
- Certainly a character is always 1 byte long, since that is what a byte is
- defined as. However, nowhere does the standard require a byte to be 8 bits;
- an implementation with 47-bit bytes can conform to the standard.
-
- The assumption that 1 byte = 8 bits occurs at several other points in the book.
- I won't always bother to point it out.
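As a small illustration, code that needs the number of bits in a type can ask the implementation instead of assuming 8. The function names below are hypothetical; CHAR_BIT is the macro from <limits.h>.

```c
#include <limits.h>
#include <stddef.h>

/* Number of bits in a byte on this implementation: at least 8, possibly more. */
int bits_per_byte(void)
{
    return CHAR_BIT;
}

/* Width of an int's storage in bits, computed without assuming 8-bit bytes. */
size_t bits_in_int(void)
{
    return sizeof(int) * CHAR_BIT;
}
```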
-
- 6.1
- The book carefully talks about tokens, and then proceeds to mention
- preprocessing tokens, while totally failing to note the difference, or why
- both concepts exist. I would have thought that this was exactly the sort of
- thing annotation was all about.
-
- 6.1.1
- @@ No other keywords are allowed in a conforming program.
- False. Other keywords are allowed, providing that they either occupy the
- implementation namespace (such as "__far"), or that they are only used after
- inclusion of a non-standard system header. For example, a compiler could
- state that, following "#include <8086.h>", "far" is a keyword. Since no
- strictly conforming program can include that header, and providing that "far"
- is not treated specially without it, such a compiler would conform to the
- standard.
-
- Of course, no other keywords are allowed in a *strictly* conforming program.
-
- 6.1.2
- If one is going to mention that only the first six characters of external
- names are significant, one should also mention that the case of those six
- characters is not.
-
- 6.1.2.1
- @@ * File scope begins with the beginning of the file and ends with the end
- @@ of the file
- @@ * Block scope begins with the opening { of a block and ends with its
- @@ associated closing }.
- This is not true: while the scopes end as described, they begin, for each
- identifier, at the end of its "declarator" (that is, at the comma, equals
- sign, or semicolon after it is declared). This is particularly important for
- identifiers with block scope. Consider this code:
-
- /* Line 1 */ {
- /* Line 2 */ int i = 10;
- /* Line 3 */ {
- /* Line 4 */ int j = i;
- /* Line 5 */ int i = 5;
- /* Line 6 */ printf ("i = %d, j = %d\n", i, j);
- /* Line 7 */ }
- /* Line 8 */ }
-
- All three variables have block scope, but they are different:
- outer i: from the "=" on line 2 to the "}" on line 8
- inner i: from the "=" on line 5 to the "}" on line 7
- j: from the "=" on line 4 to the "}" on line 7
- In particular, the "i" on line 4 refers to the one in the outer block, and
- so j has the value 10, not 5.
-
- 6.1.2.2
- @@ Identifiers with external linkage are accessible by your entire program
- Once again this is in error - for example, an identifier with external linkage
- is not accessible in a translation unit that uses the same name with internal
- linkage. The point of linkage is to indicate when the same identifier refers
- to the same object, yet the annotations omit this entirely.
-
- 6.1.2.3
- There is no mention of the fact that each structure and union type has its
- own namespace, so that more than one structure or union can have a field
- with a given name.
-
- 6.1.2.5
- @@ An unsigned integer expression cannot overflow. This is because there is
- @@ no way to represent such an overflow as an unsigned quantity.
- More nonsense. An implementation either does or doesn't have a way to represent
- overflow - usually integers don't, while floating point may or may not (some
- systems have INFINITY values that effectively indicate overflow). However,
- an unsigned integer expression cannot overflow because the standard says so -
- the choice was made that unsigned integer arithmetic is done modulo some
- base (UINT_MAX+1 for unsigned int, ULONG_MAX+1 for unsigned long). There is
- no magic about this; it was an arbitrary decision by the authors of the
- standard.
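A short sketch of what "modulo UINT_MAX+1" means in practice. The function names are made up for illustration.

```c
#include <limits.h>

/* Unsigned addition never overflows: the result is reduced modulo UINT_MAX+1. */
unsigned int wrapping_add(unsigned int a, unsigned int b)
{
    return a + b;
}

/* Non-zero on every conforming implementation: UINT_MAX + 1 wraps to 0. */
int max_plus_one_is_zero(void)
{
    return UINT_MAX + 1u == 0u;
}
```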
-
- 6.1.3.4
- @@ x = 'A'; /* give x the value 65 */
- This comment, and the following text, leave the reader believing that 'A'
- must have the value 65, and by extension that C requires the use of ASCII
- codes. This is of course false, but it would be hard to tell from the book.
-
- This, plus the comments assuming 8-bit bytes, and use of the terms "high byte"
- and "low byte" of integers later on, makes me wonder whether a better title
- for the book is: _The_ANSI_C_Standard_annotated_for_some_MSDOS_compilers_ :-).
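A sketch of code that avoids depending on ASCII by using character constants instead of numeric codes. It relies on one thing the standard does guarantee, namely that the digits '0' to '9' have consecutive ascending values; no such guarantee exists for letters. The function name digit_value is hypothetical.

```c
/* Returns the value 0..9 of a decimal digit character, or -1 otherwise.
   No numeric character code appears anywhere, so this works whatever
   value the implementation gives to '0'. */
int digit_value(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return -1;
}
```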
-
- 6.1.4
- @@ In other words, the executable version of a C program contains a table
- @@ that contains the string literals used by the program.
- While this is one way to implement strings, it is not the only one. Such a
- comment does not belong in a book like this.
-
- @@ Further, the effect of changing the string literal table is implementation
- @@ dependent. The best practice is to avoid altering the string table.
- It's more than just implementation dependent (a term which, by the way, is not
- used by the standard), it's completely undefined. You *must* *not* modify a
- string literal.
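When the text really does need changing, one safe approach (a sketch, with hypothetical function names) is to initialise an array from the literal and modify the array, which is an ordinary object.

```c
#include <ctype.h>

/* Upper-cases the first character of a modifiable string in place. */
void capitalise(char *s)
{
    if (s[0] != '\0')
        s[0] = (char)toupper((unsigned char)s[0]);
}

/* text is an array initialised from a literal, not the literal itself,
   so modifying it is well defined. */
const char *capitalised_hello(void)
{
    static char text[] = "hello";
    capitalise(text);
    return text;
}
```

Calling capitalise("hello") directly would attempt to modify the literal and is undefined.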
-
- 6.2.1.2
- A description which is essentially correct is spoilt by the addition of the
- words:
- @@ In the most general terms, when you convert from a larger integer type to
- @@ a smaller type, high-order bytes are lost.
- When an integer value is converted to a signed type which can't hold that
- value, the result need not be that given by removing some bits. For example,
- a rule that converted all such values to the minimum value of the destination
- type (SCHAR_MIN, SHRT_MIN, INT_MIN) would be conforming.
-
- A simpler way to state what this section means is:
- * If the source value can be represented in the destination type, it is
- unaltered.
- * Otherwise, if the destination type is unsigned, reduce the value modulo
- U<type>_MAX+1.
- * Otherwise the destination type is signed and the value is implementation
- defined.
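The three rules above can be sketched as follows. The function names are hypothetical, and the range check in the second function is just one way for a portable program to stay clear of the implementation-defined case.

```c
#include <limits.h>

/* Conversion to an unsigned type is fully defined: the value is
   reduced modulo UCHAR_MAX+1. */
unsigned char to_uchar(long value)
{
    return (unsigned char)value;
}

/* Conversion to a signed type is only portable when the value fits,
   so check the range first and let the caller choose the fallback. */
signed char to_schar_checked(long value, signed char fallback)
{
    if (value < SCHAR_MIN || value > SCHAR_MAX)
        return fallback;
    return (signed char)value;
}
```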
-
- 6.2.1.4
- @@ When converting a larger [floating] type into a smaller one, if the value
- @@ cannot be represented, information content may be lost.
- Actually, unlike integers, such conversions are undefined, and the program
- may crash as a result.
-
- 6.2.1.5
- @@ these automatic conversions are also intuitive.
- These conversions have been the subject of much debate. This section would
- benefit from a proper explanation of the "value preserving" rules, and why
- they were chosen.
-
- 6.2.2.1
- @@ First, an array name without an index is a pointer to the first element of
- @@ the array and is not an lvalue.
- This has to be one of the worst expressions of the Rule I have ever seen !
- First, there are a number of contexts (such as sizeof) where an array name
- does *not* get changed to a pointer. Second, if the decay to a pointer takes
- place at all, it takes place whether or not there is an index; for example,
- decay takes place when the array name is used as a function argument. Last,
- an array name *is* an lvalue; it is the resulting pointer that is not.
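A small sketch of the three points, using a hypothetical array called table: sizeof sees the whole array, a function argument sees only the decayed pointer, and & can be applied to the array name because it is an lvalue.

```c
#include <stddef.h>

static int table[10];

/* In sizeof, the array name does not decay: this is the size of the whole array. */
size_t table_bytes(void)
{
    return sizeof table;
}

/* As a function argument it does decay, so the callee only ever sees a pointer. */
size_t pointer_bytes(int *p)
{
    return sizeof p;
}

/* &table is a pointer to the whole array; taking the address is
   possible because the array name is an lvalue. */
int (*whole_table(void))[10]
{
    return &table;
}
```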
-
- 6.2.2.3
- Considering how often they are used, the rather peculiar way they are
- specified, and the need to cast them in some contexts but not others, it is
- odd that null pointer constants are not mentioned at all.
-
- 6.3
- @@ The standard states that when an expression is evaluated, each object's
- @@ value is modified only once. In theory, this means the compiler will not
- @@ physically change the value of a variable in memory until the entire
- @@ expression has been evaluated. In practice, however, you may not want to
- @@ rely on this.
- The book then in effect goes on to say that "i = ++i + 1" is usually compiled
- as if it were "i += 2".
-
- As anyone who has survived the "i = i++" thread on comp.lang.c knows, this is
- not only nonsense, but dangerous nonsense. The correct way to discuss this
- part of the standard is to point out what can and can't be done in a strictly
- conforming program, and leave it at that. Suggesting that such code can ever
- have a defined answer is asking for trouble.
-
- @@ The rest of this section formally defined what type of lvalue can refer to
- @@ an object.
- Well, in one sense this is true. However, what is important is *why* only some
- lvalues can refer to a given object, and the annotations completely skip this.
- The reason is, of course, to indicate when a compiler can assume that two
- identifiers refer to the same object. For example, in:
-
- char *cp;
- int *ip;
-
- void f (double *d)
- {
- *d = 3.14159;
- *cp = 1;
- *ip = 2;
- }
-
- The rules of this section say that the assignment to *cp could potentially
- alter *d, and the compiler must generate code that takes that into account,
- but the assignment to *ip cannot, and the compiler may assume that *d and *ip
- do not overlap. This is called "aliasing", and knowing when aliasing takes
- effect is an important factor in correctly optimising code.
-
- 6.3.2.2
- @@ When no prototype for a function exists, it is not an error if the types
- @@ and/or number of parameters and arguments differ. The reason for this
- @@ seemingly strange rule is to provide compatibility with older C programs in
- @@ which prototypes do not exist.
- On the contrary, when no prototype exists, the number of arguments to a call
- must be the same as the number of parameters in the function (which cannot
- be a varargs function), and the types must be compatible after promotion.
- What should have been written is that no error message is required if these
- rules are broken.
-
- 6.3.2.3
- Though this section mentions the existence of the "common initial subsequence"
- rule for unions, it does not explain it properly, nor does it mention that
- in all other circumstances assigning to one element of a union makes all other
- elements have undefined values.
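A sketch of the case the "common initial subsequence" rule is there to permit: several structures in a union that all begin with the same member, which may be inspected through any of them. The types and names are invented for illustration.

```c
/* Both structures start with an int member, so they share a
   common initial sequence consisting of "kind". */
struct circle { int kind; double radius; };
struct rect   { int kind; double width, height; };

union shape {
    struct circle c;
    struct rect   r;
};

/* Reads "kind" through the circle member regardless of which member
   was last stored; this is allowed only because "kind" lies in the
   common initial sequence. */
int shape_kind(const union shape *s)
{
    return s->c.kind;
}
```

Reading s->c.radius after storing a rect would fall outside the rule and gives no useful guarantee.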
-
- 6.3.6
- There is no mention of the rule that addition and subtraction of pointers and
- integers must yield a pointer to the same array or one past the end of the
- array.
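A minimal sketch of the idiom that the "one past the end" rule exists to support. The function name sum is hypothetical.

```c
#include <stddef.h>

/* Sums the n elements starting at a. */
long sum(const int *a, size_t n)
{
    /* One past the last element: legal to form and to compare against,
       but not to dereference. */
    const int *end = a + n;
    const int *p;
    long total = 0;

    for (p = a; p != end; ++p)
        total += *p;
    return total;
}
```

Forming a + n + 1, or a - 1, would already be undefined, even if the result were never dereferenced.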
-
- 6.3.7
- @@ When right-shifting a negative value, generally, ones are shifted in (thus
- @@ preserving the sign bit), but this is implementation dependent.
- The result of signed right shift of a negative number is implementation
- defined; there is no suggestion in the standard that shifting in ones is the
- "best" thing to do.
-
- 6.3.13
- There is no mention of the fact that && and || evaluate explicitly left to
- right, and stop when the result is known. This would be an opportunity to
- discuss sequence points, but the opportunity is missed.
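A small example of why the left-to-right, stop-early guarantee matters: the right operand below is evaluated only when the left one has shown it to be safe. The function name is hypothetical.

```c
#include <stddef.h>

/* Returns 1 if s is a non-null string whose first character is c.
   s[0] is evaluated only if s != NULL, because && stops as soon as
   the result is known. */
int starts_with(const char *s, char c)
{
    return s != NULL && s[0] == c;
}
```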
-
- 6.3.16.2
- When talking about compound assignments (+= etc.), the annotations mention
- that "a += b" means the same as "a = a + b", but do not point out that the
- two are not equivalent; for example, "*a++ *= 2" is strictly conforming code
- which increments a once, while "*a++ = *a++ * 2" is not.
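A sketch using the conforming form, in a hypothetical helper that doubles every element of an array. The left operand of *= is evaluated exactly once, so the pointer advances by one element per iteration.

```c
#include <stddef.h>

/* Doubles each of the n elements starting at a. */
void double_all(int *a, size_t n)
{
    while (n-- > 0)
        *a++ *= 2;   /* a is incremented once per element */
}
```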
-
- 6.3.17
- Again, there is no mention of sequence points.
-
- 6.5
- @@ In simple language, a declarator is the name of the object being declared.
- In real C, a declarator is everything about the type and name of the object
- except the basic type and storage class. For example, in "static int *p[5];",
- the declarator is "*p[5]", and includes the concepts of pointer, array, and
- size of array as well as the name.
-
- 6.5.1
- @@ A variable declared using extern is not a definition.
- Not only is this wrong, but the annotations to 6.7.2 directly contradict it,
- with the correct example of "extern int count = 10;".
-
- @@ In essence, a static local variable is a global variable with its scope
- @@ restricted to a single function.
- Actually, a static local variable is a global variable with its scope
- restricted to some block scope; that is, from the end of its declarator to
- the closing } of the block it is declared in.
-
- @@ When static is applied to a global variable or function, it causes that
- @@ variable or function to have file scope
- The global variable or function has file scope whether or not static is
- applied to it. The static keyword causes it to have internal linkage, which
- is a different matter.
-
- @@ The register specifier is only a request to the compiler, which may be
- @@ completely ignored.
- It can't be completely ignored, because whether or not it affects the way in
- which the variable is implemented, it is still illegal to take the address of
- an object declared register.
-
- 6.5.2.1
- There is no mention of the implementation-defined aspects of bit fields.
-
- @@ This padding must occur at the end, not at the beginning, of the object.
- Padding can occur anywhere except at the beginning of a structure. In
- particular, it can occur between two fields. Of course a union can only be
- padded at the end.
-
- 6.5.3
- @@ (Many compilers display a warning about this fragment, but still accept
- @@ it.)
- @@ const int i = 10;
- @@ int *p;
- @@ p = &i;
- @@ *p = 0; /* modify a const object through p */
- Actually, the standard requires a diagnostic for the third line, because
- it violates the third dashed item of the constraints of 6.3.16.1. If an
- explicit cast had been used in that line, I believe that the assignment
- would be strictly conforming. If so, then it is true that the standard does
- not require a diagnostic for the last line, but nevertheless it is undefined,
- not just something to warn about.
-
-
- 6.5.4
- @@ The information and constraints in this section are mostly applicable
- @@ to compiler implementors.
- Since this section defines how to declare arrays, pointers, and procedure
- prototypes, one has to wonder what the author actually considers interesting !
-
- 6.5.4.3
- Considering that it has come up in at least two Defect Reports, I would have
- expected some mention of the rule about typedef names within prototypes.
-
- 6.5.5
- A useful way to think of a type name is as a declaration with the identifier
- being declared omitted. So, for example, if v is declared as:
- unsigned char *v[5];
- then the type of v is:
- unsigned char *[5];
-
- 6.5.7
- @@ The general form of an initialization is
- @@ type var = initializer;
- Once again, the whole concept of declarators is omitted. While it is true that
- that is one form of an initialization, it excludes lines like:
- int a [5] = { 1, 2, 3, 4, 5 };
-
- 6.6.4.2
- This is another example where the annotations describe a "general form"
- which isn't. In this case, it implies that the "default" case label must be
- the last one in the switch, and that it can't have an associated "break".
- The problem with these "general" forms is that, while they are fine in a
- teaching context, they omit all the grubby details that a user of the standard
- needs to know, such as fall-through cases, or Duff's Device.
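As an illustration of two of those details, here is a sketch of a switch in which the default label comes first and ends with a break, and in which one case deliberately falls through into the next. The function name and the scoring are invented.

```c
/* Returns 2 for 'a', 1 for 'b', and -1 for anything else. */
int score(int c)
{
    int n = 0;

    switch (c) {
    default:            /* need not be last, and may end with break */
        n = -1;
        break;
    case 'a':
        n += 1;
        /* fall through */
    case 'b':
        n += 1;
        break;
    }
    return n;
}
```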
-
- I would also have appreciated a warning that ordinary labels are still allowed
- within the body of a switch statement, so:
- switch (i)
- {
- /* ... */
- defualt:
- j = 0;
- break;
- }
- is legal code, but is *not* the default case of the switch.
-
- 6.7.1
- @@ To understand the difference between the modern and old forms, here is the
- @@ same function defined using both forms:
- @@ /* Modern function definition. */
- @@ float f (int a, char c)
- @@ {
- @@ /* ... */
- @@ }
- @@ /* Old-form function definition. */
- @@ float f (a, c)
- @@ int a;
- @@ char c;
- @@ {
- @@ /* ... */
- @@ }
-
- Unfortunately, these two aren't exactly the same. With the modern function
- definition, the argument corresponding to c is converted to type char and
- passed to the function. With the old-form definition, it is converted to int,
- passed to the function as an int, and then converted to char.
-
- Why does this matter, you may ask ? Well, it matters when we're trying to
- write a prototype for the function. The prototype for the new form definition
- is:
- float f (int a, char c);
- as you might expect. However, the prototype for the old form is:
- float f (int a, int c);
-
- 6.8.2
- @@ The #include statement has these two forms:
- Actually, it has three forms. While the third is fairly uncommon, it ought at
- least to be acknowledged.
-
- 6.8.3
- Probably just a typographical error, but the expansion near the bottom of the
- page is:
- printf ("%d ", ABS (((-20) < 0 ? -(-20) : (-20)));
- and should be:
- printf ("%d ", ((-20) < 0 ? -(-20) : (-20)));
-
- 6.8.6
- There is no mention of the fact that using any #pragma in a translation
- unit (this means after #ifdef'd-out code has been removed) prevents it from
- being strictly conforming.
-
-
- 7.1.2
- The title of this subclause is: "Standard headers", but the annotations
- begin with: "A header file". This obscures the fairly important point that
- the standard headers need not be files; there is at least one implementation
- where the effect of the standard headers is known by the compiler, and there
- are no such files at all.
-
- @@ All conforming C compilers will supply all of the functions described here.
- This only applies to "hosted" implementations, and is not true for
- "freestanding" implementations.
-
- 7.1.3
- @@ Frankly, many C programmers are not aware of the rules described in
- @@ this section.
- Quite right ! Unfortunately, the chance to explain the rules was missed.
-
- 7.1.4
- @@ If errno is zero, then no error has been detected.
- This isn't true at all. No library function will ever set errno to zero, but
- if it *is* zero before one is called, it can remain zero even if an error does
- occur.
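The usual way to use errno correctly is to clear it yourself before the call and to examine it only once the call has reported a problem, or, as with strtol(), when that is the only way to detect one. A minimal sketch, with a hypothetical wrapper name:

```c
#include <errno.h>
#include <stdlib.h>

/* Converts text to a long. Returns 0 and stores the value on success,
   -1 if there were no digits or the value was out of range. */
int parse_long(const char *text, long *out)
{
    char *end;
    long value;

    errno = 0;                      /* clear any stale value first */
    value = strtol(text, &end, 10);
    if (end == text)                /* no digits were converted */
        return -1;
    if (errno == ERANGE)            /* can only have been set by this call */
        return -1;
    *out = value;
    return 0;
}
```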
-
- 7.1.6
- The annotations include an example of offsetof(). Unfortunately, the
- explanation of this example assumes that there is no padding in the structure.
- If structures had no padding, offsetof() wouldn't be needed because the offset
- of a field could be computed from the sizes of the preceding fields.
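A sketch of an offsetof() example that makes no assumption about padding. The structure and function names are hypothetical; on many implementations the offset of value is larger than sizeof(char), which is exactly the case the macro exists for.

```c
#include <stddef.h>

struct record {
    char   tag;
    double value;
};

/* Offset of "value" as the implementation actually lays it out. */
size_t value_offset(void)
{
    return offsetof(struct record, value);
}

/* Bytes of padding, if any, between "tag" and "value". */
size_t padding_before_value(void)
{
    return offsetof(struct record, value) - sizeof(char);
}
```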
-
- 7.3
- These functions are nearly all locale dependent: whether a character is a
- letter depends on the language in use as well as the character set.
- Unfortunately, the opportunity to explain this has been omitted in favour of
- a long example printing lines like:
- @@ x is alphanumeric
-
- 7.4.1.1
- @@ The setlocale() function sets all or a specified portion of those items
- @@ described in the lconv structure
- This is true in one sense, but oh so misleading. The setlocale() function
- alters the meaning of many of the functions in the standard. For example, it
- can change which characters are letters, or it can alter the decimal point
- character. It can also affect the order in which strcoll() sorts strings.
- In all, there are five "categories" that it can affect. The lconv structure
- is affected by two of these, but not the other three, and it is not the only
- thing that these two affect.
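A small sketch of setlocale() applied to a single category. It selects the "C" locale for LC_NUMERIC only and then reads the one lconv member that this category is guaranteed to control. The function name is hypothetical.

```c
#include <locale.h>

/* Selects the "C" locale for the numeric category only and returns
   the decimal point string, or NULL if the request was refused.
   In the "C" locale the decimal point is ".". */
const char *c_locale_decimal_point(void)
{
    if (setlocale(LC_NUMERIC, "C") == NULL)
        return NULL;
    return localeconv()->decimal_point;
}
```

The same call with LC_CTYPE or LC_COLLATE would leave the lconv structure alone while changing the behaviour of functions such as isalpha() and strcoll().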
-
- 7.6
- The example calls setjmp() using the statement:
- @@ result = setjmp (jumpbuf);
- Unfortunately, the standard puts strict limits on the places in which setjmp
- can be called; essentially it must be one of the four forms:
- while (setjmp (jumpbuf))
- while (setjmp (jumpbuf) < 42)
- while (!setjmp (jumpbuf))
- setjmp (jumpbuf);
- [The "while" may be replaced by "if" or "switch", or may be the implicit
- while of a "for" statement.]
- The example in the annotations, however, doesn't use any of these forms, and
- so the compiler must produce a diagnostic for this code.
-
- The standard also puts limitations on what can be done with local variables
- in functions that call setjmp(). I am surprised to find no mention of these
- limitations at all.
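A sketch of a call that stays within the permitted forms: setjmp() appears as one operand of a comparison with an integer constant, and that comparison is the entire controlling expression of an if. The function names are invented. The function that calls setjmp() modifies no local variable between the setjmp() and the longjmp(), which sidesteps the restrictions on locals.

```c
#include <setjmp.h>

static jmp_buf recovery_point;

/* Jumps back to the recovery point when asked to fail. */
static void do_work(int should_fail)
{
    if (should_fail)
        longjmp(recovery_point, 1);
}

/* Returns 0 if do_work() returned normally, 1 if it bailed out. */
int run_with_recovery(int should_fail)
{
    if (setjmp(recovery_point) != 0)
        return 1;               /* arrived here via longjmp */
    do_work(should_fail);
    return 0;
}
```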
-
- 7.7
- There is no mention of the type sig_atomic_t, and when it should be used.
-
- 7.9
- @@ The type fpos_t is some type of an unsigned integer.
- Actually, not only is there no such requirement in the standard, but fpos_t
- was designed for the circumstances when a file position *can't* be fitted
- into an unsigned long. The forthcoming Normative Addendum 1 also puts further
- requirements on fpos_t which, while compatible with the current standard, can
- *not* be implemented if it is an unsigned integer.
-
- 7.9.2
- @@ Thus, it is permissible for a text stream to treat all characters as part of
- @@ one long, uninterrupted line, if it so chooses.
- Fine sounding words. I wish I knew what they mean !
-
- The standard states that an implementation may treat spaces at the end of lines
- in text files specially, and may add and remove zero bytes at the end of binary
- files. Neither of these rules is mentioned.
-
- 7.9.5.2
- There is no mention of fflush(NULL), nor that fflush cannot be applied to
- an input stream.
-
- 7.9.6.1
- @@ Note that if stream is a pointer to stdout,
- Just a nit, but stdout is a pointer, and stream cannot point to it.
-
- Here, and in many other places, printf() is called with a format of "%lf" and
- a corresponding argument which is a double. Unfortunately, the standard states
- that "%f" is the correct format for a double, and "%lf" is undefined. This is
- a particularly bad sin because the description of the "l" flag is missing
- (left page 132 of the book is a repeat of page 131).
-
- While I cannot of course just copy the missing text, I can summarise what has
- been lost.
-
- After the % appear:
- - optional flags
- - an optional field width
- - an optional precision
- - an optional "h", "l", or "L"
- - a character specifying the conversion type
-
- The flags are:
- - minus: left justify the conversion within the field width, instead of right
- justify;
- - plus: any signed conversion will begin with a plus or minus sign;
- - space: any signed conversion which does not begin with a sign will be
- prefixed with a space; this is ignored if the plus flag also appears;
- - hash: for "o" conversion, ensure that the first digit is a zero;
- for "x" conversion, a non-zero result will begin with "0x";
- for "X" conversion, a non-zero result will begin with "0X";
- for floating-point conversions, there will always be a decimal point
- even if no digits follow it;
- for "g" and "G" conversion, trailing zeroes are retained;
- hash may not appear on any other conversions;
- - zero: explained on page 133 of the book.
-
- The field width is an asterisk or a decimal integer. If the converted value
- has fewer characters than the field width, it is padded to the width (unless
- altered by the flags, the padding is with spaces on the left). The field width
- cannot reduce the width of the converted value. Note that a zero at the start
- of the width is the "0" flag; it does not mean that the width is in octal.
-
- The precision is a dot followed by an asterisk, a decimal integer, or nothing
- (equivalent to zero). It can only appear with certain conversions, and its
- meaning varies:
- d, i, o, u, x, X: the minimum number of digits to appear
- e, E, f: the number of digits after the decimal point
- g, G: the maximum number of significant digits
- s: the maximum number of characters to be taken from the string
- It must not appear with any other conversion.
-
- If the width, precision, or both, is an asterisk, the actual value is taken
- from an int argument to the fprintf() function. The arguments are always in
- the order:
- * width if an asterisk
- * precision if an asterisk
- * actual value to be converted
- A negative width means add the "-" flag and use the absolute value; a negative
- precision means that the precision should be treated as if omitted.
-
- The optional letters may appear as follows:
- h with d, i, o, u, x, X:
- the argument value (which is int or unsigned int) will be converted to
- short or unsigned short before printing;
- h with n:
- the argument is a (short *) rather than a (int *);
- l with d, i, o, u, x, X:
- the argument is long or unsigned long
- l with n:
- the argument is a (long *) rather than a (int *);
- L with e, E, f, g, G:
- the argument is a long double rather than a double.
- These letters may not appear with any other conversion.
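To make the summary concrete, here is a sketch that exercises an asterisk width, the "-" and "0" flags, a precision on "s", "%f" with a double, and "l" with "d". The function name and the values are made up; sprintf() is used so that the result can be inspected.

```c
#include <stdio.h>

/* Formats a few values into buf, which must hold at least 32 chars.
   Returns the number of characters written, excluding the final null. */
int format_sample(char *buf)
{
    return sprintf(buf, "%*d|%-4d|%.3s|%05.1f|%ld",
                   4, 42,       /* width taken from an int argument */
                   7,           /* "-" flag: left justified in 4 columns */
                   "abcdef",    /* precision limits the characters taken */
                   2.5,         /* a double is printed with %f, not %lf */
                   123456L);    /* "l" with d: the argument is a long */
}
```

With these arguments the buffer receives "  42|7   |abc|002.5|123456".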
-
- 7.9.6.2
- Both the examples of scansets don't use a field width. This means that if the
- user inputs a line which is too long, it will overflow the buffer with
- potentially disastrous results. They also use "fflush(stdin)", which is
- undefined.
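For comparison, a scanset with a field width might look like the sketch below. The function name read_line_start and the buffer size of 10 are invented for this example, and sscanf() is used in place of scanf() so that the input is visible.

```c
#include <stdio.h>

/* Copy the start of input, up to the first newline, into buf.
   Returns the value of sscanf(): 1 if something was stored. */
int read_line_start (const char *input, char buf[10])
{
    /* A width of 9 leaves room for the terminating null character. */
    return sscanf (input, "%9[^\n]", buf);
}
```

Given "a very long line of input", only "a very lo" is stored; the rest remains unread rather than running off the end of the buffer.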
-
- 7.9.7
- The annotations talk about "high order byte" and "low order byte" as if an
- integer only has two bytes. In any case, these functions are not defined in
- terms of "bytes", but in terms of conversion to unsigned char.
-
- The first example calls fgetc() and assigns the result to a char variable.
- This means that an error or end-of-file will cause the program to loop forever.
-
- 7.9.10.2
- @@ The following fragment illustrates how files are commonly read:
- @@ do {
- @@ ch = fgetc (fp);
- @@ /* ... */
- @@ } while (!feof (fp));
- This example suffers from the "Pascal disease". The function feof() does not
- mean "end of file has been reached", but means "a previous read hit end of
- file and returned EOF". Thus, when the last character of the file is read and
- processed, "feof (fp)" will still be false, and the loop will be repeated one
- more time. This time, ch will be set to EOF, but there is no indication in
- the annotations that this must be treated specially. Only after this EOF has
- been processed, probably wrongly - for example, if the file is being copied
- to somewhere else, a spurious character will be output - will the call to
- feof() return true.
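The extra iteration can be demonstrated by counting. In the sketch below, both function names are invented; the first copies the loop quoted above, and the second tests the value returned by fgetc() instead.

```c
#include <stdio.h>

/* Count loop iterations using the pattern quoted from the annotations. */
long count_with_feof (FILE *fp)
{
    long n = 0;
    int ch;

    do {
        ch = fgetc (fp);
        n++;                /* "processes" ch, even when it is EOF */
    } while (!feof (fp));
    return n;
}

/* Count characters by testing the value returned by fgetc() itself. */
long count_with_eof_test (FILE *fp)
{
    long n = 0;
    int ch;                 /* int, so that EOF is distinguishable */

    while ((ch = fgetc (fp)) != EOF)
        n++;
    return n;
}
```

On a file holding the three characters "abc", the first function goes round its loop four times and the second three.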
-
- 7.9.10.3
- @@ Also, for files opened for binary operations, EOF is a valid binary value
- @@ and does not necessarily indicate an error or end-of-file condition.
- This is dangerous nonsense, caused because the annotations use char variables
- instead of ints to hold the results of fgetc(). What the standard says is, in
- effect, that fgetc() returns a positive or zero value if it read a character,
- and a negative value (EOF) if it reached end-of-file or an error occurred.
-
- It is true that EOF, cast to the type unsigned char, is identical to a value
- that can be read from a binary file (or even a text file). However, this is
- just the effect of bad programming; anyone with experience in C file handling
- should be aware of this.
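A sketch of the point: a byte with every bit set, written to a binary stream and read back into an int, is a positive value and cannot be confused with EOF. The function name read_back_all_ones is invented for this example.

```c
#include <limits.h>
#include <stdio.h>

/* Write one byte with all bits set to a temporary binary file, then
   read it back. Returns what fgetc() delivered, or EOF on failure. */
int read_back_all_ones (void)
{
    FILE *fp = tmpfile ();
    int ch;

    if (fp == NULL)
        return EOF;
    fputc (UCHAR_MAX, fp);
    rewind (fp);
    ch = fgetc (fp);        /* int: UCHAR_MAX here, never negative */
    fclose (fp);
    return ch;
}
```

Only when the result is squeezed into a char does the distinction disappear.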
-
- 7.10.1
- @@ Also, remember that if the string does not contain a valid numeric value
- @@ as defined by the function, then 0 is returned. Although strtod(),
- @@ strtol(), and strtoul() set errno when an out-of-range condition exists,
- @@ there is no requirement that errno be set when the string does not contain
- @@ a number. Thus, if this is important to your program, you must manually
- @@ check for the presence of a number before calling one of the conversion
- @@ functions.
- Actually, it is quite hard to make such a check, but luckily it is also
- unnecessary. If there is no number in the string, all three functions set
- *endptr to the original value of nptr, while if there is (even if it is zero)
- they set it to point after the last character of the number.
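A minimal sketch of the endptr test, using strtol(). The function name parse_long and its return convention are invented for this example; range errors reported through errno are not examined.

```c
#include <stdlib.h>

/* Returns 1 and stores the value if s begins with a number
   (after optional white space); returns 0 if there is none. */
int parse_long (const char *s, long *value)
{
    char *end;
    long v = strtol (s, &end, 10);

    if (end == s)
        return 0;           /* no conversion was performed */
    *value = v;
    return 1;
}
```

This distinguishes the string "0", which yields 1 with a value of zero, from "hello", which yields 0.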
-
- 7.10.2
- The example appears to assume that time_t is an integral type, and so assigns
- the result of time() to a long. In fact, it could be double, and it might be
- that the cast always yields zero. To extract a random number from the value
- returned by time(), it is necessary to do something like the following, which
- constructs an unsigned int from all the bits of a time_t value.
-
- #include <string.h>     /* memcpy */
- #include <time.h>       /* time_t */
-
- unsigned int random_from_time (time_t t)
- {
-     unsigned int i, j, k;
-     char *p;
-
-     i = 0;
-     p = (char *) &t;
-     /* Divide t up into pieces each the size of an unsigned int */
-     for (k = 0; k + sizeof j <= sizeof t; k += sizeof j)
-     {
-         /* Copy the bits of the piece into j and add the value to i */
-         memcpy ((char *) &j, p + k, sizeof j);
-         i += j;
-     }
-     /* Do the same with any remnant (e.g. if j is 4 bytes and t is 11) */
-     if (k < sizeof t)
-     {
-         j = 0;
-         memcpy ((char *) &j, p + k, sizeof t - k);
-         i += j;
-     }
-     return i;
- }
-
- 7.10.7 and 7.10.8
- @@ Since multibyte characters are implementation-specific, you should refer
- @@ to your compiler's user manual for details.
- There is a lot that can be said about multibyte and wide characters without
- having to know individual encodings, and there is a sore lack of such
- tutorial material. It is a great pity to be faced with two almost blank pages
- instead.
-
- 7.11.4
- The annotations use the term "rearrange" when discussing strxfrm(). It should
- be noted that the result of strxfrm may be longer than the original string.
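Because the result may be longer, the safe approach is to ask strxfrm() how much space it needs before allocating. The sketch below does this; the function name xfrm_dup is invented for this example.

```c
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated transformed copy of s, or NULL on failure.
   The caller must free the result. */
char *xfrm_dup (const char *s)
{
    /* With a size of 0 the destination may be a null pointer; the value
       returned is the length needed, which may exceed strlen (s). */
    size_t need = strxfrm (NULL, s, 0);
    char *out = malloc (need + 1);

    if (out != NULL)
        strxfrm (out, s, need + 1);
    return out;
}
```

Two strings transformed this way can then be compared with strcmp(), giving the same ordering as strcoll() on the originals.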
-
- The example compares two arrays of floats using memcmp. While such a comparison
- is strictly conforming, it is not useful - the result of the comparison depends
- on the details of the encoding of floats, and is in no way related to which
- number is greater or smaller. (For example, it is possible to have an encoding
- in which 0 < 2, but 2 > 3, as far as this comparison works. In the same way,
- comparing integers with memcmp is equally useless on a little-endian system.)
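If an ordering of two float arrays is what is wanted, the elements have to be compared as values. One possible sketch follows; the function name compare_floats is invented, and the -1, 0, 1 results merely imitate the sign convention of memcmp().

```c
#include <stddef.h>

/* Compare two arrays of n floats element by element, by value.
   Returns -1, 0 or 1 according to the first elements that differ. */
int compare_floats (const float *a, const float *b, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
    {
        if (a[i] < b[i])
            return -1;
        if (a[i] > b[i])
            return 1;
    }
    return 0;
}
```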
-
- 7.12.2.3
- There is no description of mktime and how it can be used to solve problems
- like "what day is 100 days after December 25th 1993". This appears to be solely
- because there was no room on the page opposite the definition of mktime.
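The problem quoted above takes only a few lines, because mktime() accepts out-of-range members and normalises them. In the sketch below the function name days_after_christmas_1993 is invented, and midday is chosen as an assumption to keep clear of any daylight saving change.

```c
#include <time.h>

/* Fill *out with the date n days after 25 December 1993.
   Returns 0 on success, -1 if the time cannot be represented. */
int days_after_christmas_1993 (int n, struct tm *out)
{
    struct tm t = {0};

    t.tm_year = 1993 - 1900;
    t.tm_mon = 11;          /* months count from 0, so 11 is December */
    t.tm_mday = 25 + n;     /* deliberately out of range */
    t.tm_hour = 12;         /* midday, away from any DST change */
    t.tm_isdst = -1;        /* let mktime() work out DST */
    if (mktime (&t) == (time_t) -1)
        return -1;
    *out = t;               /* now normalised, with tm_wday filled in */
    return 0;
}
```

With n equal to 100 the result is 4 April 1994, and tm_wday is 1 (Monday).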
-
-
- [END]
-
- --
- Clive D.W. Feather | Santa Cruz Operation | If you lie to the compiler,
- clive@sco.com | Croxley Centre | it will get its revenge.
- Phone: +44 923 816 344 | Hatters Lane, Watford | - Henry Spencer
- Fax: +44 923 210 352 | WD1 8YN, United Kingdom |
-